Rewrite aggregation processing to be more efficient #223

danielkza · 2016-07-22T19:03:21Z

Previously aggregation worked by traversing the modules tree in
pre-order. But to ensure that children are aggregated before their
parents, we can relax that order a bit and just process all the results
on the same level before any of those on the level above it (the topmost
level consisting of the root).

This allows fetching many more results at once and significantly reduces
the number of trips to the database: from a number proportional to the
number of nodes to no more than the maximum depth of the tree.
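
To make the level-by-level idea concrete, here is a minimal sketch in plain Ruby. It is not the code from this PR: `Node`, `FakeStore`, `levels` and `aggregation_order` are hypothetical names, and the store is an in-memory stand-in that only counts how many lookups a database would receive.

```ruby
# Hypothetical in-memory stand-in for the module results table.
Node = Struct.new(:id, :parent_id)

class FakeStore
  attr_reader :queries

  def initialize(nodes)
    @by_parent = nodes.group_by(&:parent_id)
    @queries = 0
  end

  # Simulates a single query: all children of the given parent ids.
  def children_of(parent_ids)
    @queries += 1
    parent_ids.flat_map { |id| @by_parent.fetch(id, []) }
  end
end

# Returns the levels of the tree, root level first. It issues one lookup
# per level, including a final lookup that finds no more children.
def levels(store, root)
  result = [[root]]
  loop do
    children = store.children_of(result.last.map(&:id))
    break if children.empty?
    result << children
  end
  result
end

# Deepest level first, so every child is visited before its parent.
def aggregation_order(store, root)
  levels(store, root).reverse.flatten.map(&:id)
end
```

In this sketch the number of lookups grows with the depth of the tree rather than with the number of nodes, which is the property described above.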

It also makes it much easier to accumulate the new tree metric results
and create them all at once. That also saves a huge number of trips to
the database.
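
As a rough illustration of accumulating results and creating them in one go, the sketch below buffers every computed value and writes once at the end. Everything in it is assumed for the example: `FakeResultsTable`, `insert_all` and `aggregate` are hypothetical names, and summing the children's values is only one possible aggregation form.

```ruby
# Hypothetical stand-in for a table that counts write round trips.
class FakeResultsTable
  attr_reader :rows, :writes

  def initialize
    @rows = []
    @writes = 0
  end

  # Simulates one bulk insert carrying many rows.
  def insert_all(new_rows)
    @writes += 1
    @rows.concat(new_rows)
  end
end

# levels_deepest_first: arrays of module ids, deepest level first.
# children: parent id => array of child ids.
# values: ids that already have a value (for example, the leaves).
# Assumption for this sketch: a parent's value is the sum of its children's.
def aggregate(levels_deepest_first, children, values, table)
  values = values.dup
  pending = []
  levels_deepest_first.each do |level|
    level.each do |id|
      next if values.key?(id)
      values[id] = children.fetch(id, []).sum { |child| values.fetch(child) }
      pending << { module_id: id, value: values[id] }
    end
  end
  # A single write for all new results instead of one write per node.
  table.insert_all(pending) unless pending.empty?
  values
end
```

Because children are processed before parents, every value a parent needs is already in memory when it is reached, so nothing has to be written before the final bulk insert.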

Regarding tests: a complete refactor was necessary, and made possible by
the module results tree factory. The tests ended up much cleaner and
arguably better, as they can verify the actual values being aggregated
while mocking only the necessary data accesses.


Using the aggregation performance tests on my development machine, the average time went from around 200s to <20s (combined with the indexing changes that were previously made).


rafamanzo · 2016-07-22T20:21:00Z

I believe I've split this into #222, #224 and #225. What do you think? If you feel like it, please update their descriptions as you wish.

danielkza · 2016-07-22T21:03:54Z

Closed by splitting into #222, #224 and #225.

danielkza added 5 commits July 22, 2016 15:09

Update ModuleResult tree-walking methods

0b0d8be

- Slightly refactor pre-order
- Add level order, which should be faster when fetching from the database, by making one query per level of the tree instead of possibly one per node

Optimize TreeMetricResult#descendant_values

c7c1e73

The previous implementation was incredibly sub-optimal. It can be easily replaced with a join on the parent_id.

Remove unused MetricResultAggregator class

0fcb09e

It is no longer used since the new aggregation processing collects the tree metric results itself, making the auxiliary logic to find out whether a node already has a result for a metric unnecessary.

Update aggregation tests to verify all forms

8137ecc

danielkza added the in progress label Jul 22, 2016

danielkza closed this Jul 22, 2016

danielkza removed the in progress label Jul 22, 2016

danielkza deleted the optimize_aggregation branch July 26, 2016 12:14
